- Path: news.infi.net!usenet
- From: nngis@norfolk.infi.net (Greg DiGiorgio)
- Newsgroups: comp.lang.c
- Subject: Re: Sorting large files
- Date: 25 Jan 1996 20:44:24 GMT
- Organization: Customer of InfiNet
- Message-ID: <4e8q38$4va@nw002.infi.net>
- References: <4e8j9b$cuf@longwood.cs.ucf.edu>
- Reply-To: nngis@norfolk.infi.net
- NNTP-Posting-Host: h-standbyme.norfolk.infi.net
- Mime-Version: 1.0
- X-Newsreader: WinVN 0.99.3
-
- In article <4e8j9b$cuf@longwood.cs.ucf.edu>, schnitzi@longwood.cs.ucf.edu
- says...
- >
- >There was some discussion here a little while
- >back on how to sort the lines in a large file
- >without having to have a huge character array.
- >I suggested using ftell and fseek to hunt down
- >the particular lines you are comparing. Only
- >just the other day did I notice (via a web
- >search engine) that someone posted a question
- >on how this could be done... So here goes.
- >
- >... snipped ...
-
- just throwing my 2 cents (perhaps 1.2 cents) in...
-
- Sorting large files is a problem, especially if you try to do it entirely
- in memory. On mainframes, one buys an entire package devoted to sorting. In
- UNIX, you use the "sort" command. RDBMSs have to use optimized sorts
- to provide fast response. Either way, I cannot envision sorting large
- files without temp work files to hold intermediate results. Let's assume
- you have 1 million records of 100 bytes to sort - that's 100M of data.
-
- Based on the number of records to sort, you could divide the input file
- into sets of, say, 10,000 records. Sort each set in memory and write it to
- its own work file. After accumulating a number of sorted work files, merge
- them into a single new work file and delete the ones you merged. Move on
- to the next group of work files and merge them into another work file.
- Then merge the 2 merged work files with each other and delete them as
- well. Repeat until a single work file holds the completely sorted data.
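
The merge step above can be sketched in C roughly as follows. This is only a minimal sketch, assuming fixed-length records that compare correctly byte by byte with memcmp(); the function name merge_runs and its parameters are made up for illustration and are not from the original discussion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: merge two already-sorted work files of
   fixed-length records into path_out.
   Returns 0 on success, -1 on an I/O or allocation error. */
int merge_runs(const char *path_a, const char *path_b,
               const char *path_out, size_t rec_size)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    FILE *out = fopen(path_out, "wb");
    unsigned char *ra = malloc(rec_size);   /* current record from a */
    unsigned char *rb = malloc(rec_size);   /* current record from b */
    int have_a, have_b, status = 0;

    assert(rec_size > 0);
    if (a == NULL || b == NULL || out == NULL || ra == NULL || rb == NULL) {
        status = -1;
        goto done;
    }

    /* Prime the merge with the first record of each file. */
    have_a = fread(ra, rec_size, 1, a) == 1;
    have_b = fread(rb, rec_size, 1, b) == 1;

    while (have_a || have_b) {
        /* Take from a when b is exhausted or a's record sorts first;
           ties go to a. */
        int take_a = have_a && (!have_b || memcmp(ra, rb, rec_size) <= 0);

        if (take_a) {
            if (fwrite(ra, rec_size, 1, out) != 1) { status = -1; break; }
            have_a = fread(ra, rec_size, 1, a) == 1;
        } else {
            if (fwrite(rb, rec_size, 1, out) != 1) { status = -1; break; }
            have_b = fread(rb, rec_size, 1, b) == 1;
        }
    }
    if (ferror(a) || ferror(b))
        status = -1;

done:
    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    if (out != NULL && fclose(out) != 0) status = -1;
    free(ra);
    free(rb);
    return status;
}
```

Only one record from each input is held in memory at a time, so the memory needed does not grow with the size of the work files. After a successful merge the caller would remove() the two input files, as described above.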
-
- The memory-based sort you use is your choice. Mainframe packages sample
- the data before sorting to choose the most suitable sorting method. I
- mean, you have bubble sort and insertion sort on the low end and quicksort
- & heapsort on the high end, and probably 50 other sorting algorithms
- I don't even know about.
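
If you don't want to pick an algorithm yourself, the standard library's qsort() is one reasonable choice for the in-memory part. Here is a minimal sketch, assuming fixed-length records compared byte by byte; sort_chunk, cmp_records and g_rec_size are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort() passes no context to the comparison function, so the
   record size is kept in a file-scope variable (an assumption of
   this sketch; it makes the helper non-reentrant). */
static size_t g_rec_size;

static int cmp_records(const void *x, const void *y)
{
    return memcmp(x, y, g_rec_size);
}

/* Hypothetical helper: read up to max_recs records from in, sort
   them in memory, and write them to the work file work_path.
   Returns the number of records written, or -1 on error. */
long sort_chunk(FILE *in, const char *work_path,
                size_t rec_size, size_t max_recs)
{
    unsigned char *buf;
    size_t n;
    FILE *out;
    long result;

    assert(rec_size > 0 && max_recs > 0);
    buf = malloc(rec_size * max_recs);
    if (buf == NULL)
        return -1;

    /* fread() returns the number of complete records it read. */
    n = fread(buf, rec_size, max_recs, in);
    if (ferror(in)) {
        free(buf);
        return -1;
    }

    g_rec_size = rec_size;
    qsort(buf, n, rec_size, cmp_records);

    out = fopen(work_path, "wb");
    if (out == NULL) {
        free(buf);
        return -1;
    }
    result = (fwrite(buf, rec_size, n, out) == n) ? (long)n : -1;
    if (fclose(out) != 0)
        result = -1;
    free(buf);
    return result;
}
```

Calling this in a loop with a fresh work file name each time, until it returns fewer than max_recs records, would produce the sorted sets. With the numbers above (100-byte records, 10,000 per set) each call needs about 1M of buffer.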
-
- Sorting data is such a performance-critical issue that mainframe RDBMSs
- are even resorting to sort programs implemented in hardware to speed up
- this frequently hit bottleneck.
-
- Good luck on your course of action,
- Greg DiGiorgio
-
-